New paper: crime prediction using Twitter data in San Jose

GeoFly Lab published a new study on crime risk prediction that combines two information sources: (1) historical crime records and (2) crime-related geo-tagged Twitter signals. The work focuses on improving predictions of when and where elevated risk is likely to occur, to support more informed resource planning.

This research was conducted in collaboration with Penn State University.

Crime prediction using historical crime and geo-tagged Twitter data

Many widely used approaches treat crime prediction as a single-variable problem or do not fully account for space and time together. In this paper, we use a spatio-temporal co-kriging framework (ST-Cokriging) that integrates multiple inputs to produce smoother and more consistent risk surfaces.
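
For readers unfamiliar with co-kriging, the general idea can be written as a weighted sum over a primary and a secondary variable. The form below is the textbook ordinary co-kriging predictor extended to space-time coordinates, not necessarily the exact formulation used in the paper. The symbols are assumptions for illustration: Z_1 stands for the crime variable, Z_2 for the Twitter-derived variable, and the weights lambda_i and mu_j are obtained from the fitted spatio-temporal (cross-)covariance model.

```latex
\hat{Z}_1(s_0, t_0) \;=\; \sum_{i=1}^{n} \lambda_i \, Z_1(s_i, t_i) \;+\; \sum_{j=1}^{m} \mu_j \, Z_2(s_j, t_j),
\qquad \text{subject to} \quad \sum_{i=1}^{n} \lambda_i = 1, \quad \sum_{j=1}^{m} \mu_j = 0 .
```

The two constraints keep the predictor unbiased when the means of both variables are unknown: the primary weights sum to one, and the secondary weights sum to zero.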

We evaluate performance across three common categories: street crime, property crime, and vehicle crime, with experiments comparing weekdays vs. weekends. Results show that adding social-media signals improves agreement with observed patterns and reduces prediction error (e.g., lower RMSE) compared with a model using crime history alone.
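
As a rough illustration of the error metric mentioned above, the sketch below computes RMSE for two sets of predictions against observed counts. The function name `rmse`, the variable names, and all numbers are hypothetical and made up for illustration; they are not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical crime counts per grid cell (illustrative values only).
observed = [4, 0, 7, 2, 5]
crime_only = [2, 1, 5, 4, 3]         # hypothetical model using crime history alone
crime_plus_tweets = [3, 0, 6, 3, 5]  # hypothetical model that also uses Twitter signals

rmse_crime_only = rmse(observed, crime_only)
rmse_combined = rmse(observed, crime_plus_tweets)

print(f"crime history only: {rmse_crime_only:.3f}")
print(f"crime + Twitter:    {rmse_combined:.3f}")
```

A lower RMSE means the predicted values sit closer, on average, to the observed ones, which is the sense in which the study reports an improvement.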

In practical terms, the study suggests that combining official records with timely, location-aware public signals can help identify risk hotspots more reliably, an important step toward proactive, data-informed crime prevention.

Updated: December 30, 2025